Search CORE

3 research outputs found

Bayesian-based techniques for tracking multiple humans in an enclosed environment

Author: Ata ur-Rehman (7200926)
Publication venue
Publication date: 01/01/2014
Field of study

This thesis deals with the problem of online visual tracking of multiple humans in an enclosed environment. The focus is to develop techniques to deal with the challenges of varying number of targets, inter-target occlusions and interactions when every target gives rise to multiple measurements (pixels) in every video frame. This thesis contains three different contributions to the research in multi-target tracking. Firstly, a multiple target tracking algorithm is proposed which focuses on mitigating the inter-target occlusion problem during complex interactions. This is achieved with the help of a particle filter, multiple video cues and a new interaction model. A Markov chain Monte Carlo particle filter (MCMC-PF) is used along with a new interaction model which helps in modeling interactions of multiple targets. This helps to overcome tracking failures due to occlusions. A new weighted Markov chain Monte Carlo (WMCMC) sampling technique is also proposed which assists in achieving a reduced tracking error. Although effective, to accommodate multiple measurements (pixels) produced by every target, this technique aggregates measurements into features which results in information loss. In the second contribution, a novel variational Bayesian clustering-based multi-target tracking framework is proposed which can associate multiple measurements to every target without aggregating them into features. It copes with complex inter-target occlusions by maintaining the identity of targets during their close physical interactions and handles efficiently a time-varying number of targets. The proposed multi-target tracking framework consists of background subtraction, clustering, data association and particle filtering. A variational Bayesian clustering technique groups the extracted foreground measurements while an improved feature based joint probabilistic data association filter (JPDAF) is developed to associate clusters of measurements to every target. The data association information is used within the particle filter to track multiple targets. The clustering results are further utilised to estimate the number of targets. The proposed technique improves the tracking accuracy. However, the proposed features based JPDAF technique results in an exponential growth of computational complexity of the overall framework with increase in number of targets. In the final work, a novel data association technique for multi-target tracking is proposed which more efficiently assigns multiple measurements to every target, with a reduced computational complexity. A belief propagation (BP) based cluster to target association method is proposed which exploits the inter-cluster dependency information. Both location and features of clusters are used to re-identify the targets when they emerge from occlusions. The proposed techniques are evaluated on benchmark data sets and their performance is compared with state-of-the-art techniques by using, quantitative and global performance measures

Loughborough University Institutional Repository

Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision

Author: Ata ur-Rehman (7200926)
Jonathon Chambers (1251609)
Mohsen Naqvi (1252812)
Muhammad Salman Khan (7202543)
Yanfeng Liang (7202699)
Publication venue
Publication date: 01/01/2012
Field of study

In this paper a new combination of the model of the interaural spatial cues and a model that utilizes spatial properties of the sources is proposed to enhance speech separation in reverberant environments. The algorithm exploits the knowledge of the locations of the speech sources estimated through vision. The interaural phase difference, the interaural level difference and the contribution of each source to all mixture channels are each modeled as Gaussian distributions in the time-frequency domain and evaluated at individual time-frequency points. An expectation-maximization (EM) algorithm is employed to refine the estimates of the parameters of the models. The algorithm outputs enhanced time-frequency masks that are used to reconstruct individual speech sources. Experimental results confirm that the combined video-assisted method is promising to separate sources in real reverberant rooms

Loughborough University Institutional Repository

Video-aided model-based source separation in real reverberant rooms

Author: Ata ur-Rehman (7200926)
Jonathon Chambers (1251609)
Mohsen Naqvi (1252812)
Muhammad Salman Khan (7202543)
Wenwu Wang (4352767)
Publication venue
Publication date: 01/01/2013
Field of study

Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, the interaural phase difference and the interaural level difference, as well as the mixing vectors are probabilistically modeled. The models make use of the source direction information and are evaluated at discrete timefrequency points. The model parameters are refined with the wellknown expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that by utilizing the visual modality the proposed algorithm can produce better timefrequency masks thereby giving improved source estimates. We provide experimental results to test the proposed algorithm in different scenarios and provide comparisons with both other audio-only and audio-visual algorithms and achieve improved performance both on synthetic and real data. We also include dereverberation based pre-processing in our algorithm in order to suppress the late reverberant components from the observed stereo mixture and further enhance the overall output of the algorithm. This advantage makes our algorithm a suitable candidate for use in under-determined highly reverberant settings where the performance of other audio-only and audio-visual methods is limited

Loughborough University Institutional Repository

Surrey Research Insight